Building a test collection for speech-driven web retrieval
Abstract
This paper describes a test collection (benchmark data) for retrieval systems driven by spoken queries. The collection was produced in a subtask of the NTCIR-3 Web retrieval task, which was performed in a TREC-style evaluation workshop. The search topics of the Web retrieval task were used to produce spoken queries, and its document collection was used to produce language models for speech recognition. We used this collection to evaluate the performance of our retrieval system. Experimental results showed that (a) using the target documents for language modeling and (b) increasing the vocabulary size in speech recognition were both effective in improving system performance.
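As a rough illustration of why findings (a) and (b) are plausible, the toy sketch below builds a recognition vocabulary from a target document collection and measures how many words of a query fall outside it. This is not the authors' system: the function names `build_vocabulary` and `oov_rate`, the whitespace tokenization, and the sample documents and query are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(documents, max_size):
    """Keep the max_size most frequent words found in the target documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:max_size]}


def oov_rate(query, vocabulary):
    """Fraction of query words that are missing from the vocabulary."""
    words = query.lower().split()
    if not words:
        return 0.0
    missing = sum(1 for word in words if word not in vocabulary)
    return missing / len(words)


# Hypothetical stand-ins for the target collection and a transcribed spoken query.
documents = [
    "speech recognition for web retrieval",
    "web retrieval with spoken queries",
    "language models for speech recognition",
]
query = "spoken queries for web retrieval"

small_vocabulary = build_vocabulary(documents, 3)
large_vocabulary = build_vocabulary(documents, 20)

print(oov_rate(query, small_vocabulary))
print(oov_rate(query, large_vocabulary))
```

A word that is out of vocabulary cannot be recognized at all, so a larger vocabulary drawn from the documents being searched leaves fewer query words unrecoverable. A real recognizer would also use n-gram probabilities estimated from the same collection, which this sketch omits.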
Similar resources
Experiments on Web Retrieval Driven by Spontaneously Spoken Queries
Motivated to realize the speech-driven information retrieval systems that accept spontaneously spoken queries, we developed a method to collect such speech data derived from the pre-defined search topics that had been systematically constructed for IR research. In order to evaluate both our method and the performance of the document retrieval by using the spontaneously spoken queries, we took p...
A Speech-Driven Text Retrieval System and its Evaluation Using a Test Collection
To facilitate retrieving information with spoken queries, we propose a speech-driven text retrieval system. In past research, no attempt has been made to improve speech recognition in the context of speech-driven retrieval. In our system, a language model used for speech recognition is produced based on a target text collection, so that user queries associated with the collection can be recogniz...
Language Modeling for Multi-Domain Speech-Driven Text Retrieval
We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak contents related to a target collection, we produce language models used for speech recognition based on the target collection, so as to improve both the recognition and retrieval accuracy. Experiments using existing tes...
Speech-Driven Text Retrieval: Using Target IR Collections for Statistical Language Model Adaptation in Speech Recognition
Speech recognition has of late become a practical technology for real world applications. Aiming at speech-driven text retrieval, which facilitates retrieving information with spoken queries, we propose a method to integrate speech recognition and retrieval methods. Since users speak contents related to a target collection, we adapt statistical language models used for speech recognition based ...
Collecting Spontaneously Spoken Queries for Information Retrieval
Motivated to realize the speech-driven information retrieval systems that accept spontaneously spoken queries, we developed a method to collect such speech data derived from the pre-defined search topics that had been systematically constructed for IR research. In order to evaluate both our method and the performance of the document retrieval by using the spontaneously spoken queries, we took p...
Journal:
- CoRR
Volume: cs.CL/0309019
Publication date: 2003